AITopics | sparse feature

Collaborating Authors

sparse feature

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

How does My Model Fail? Automatic Identification and Interpretation of Physical Plausibility Failure Modes with Matryoshka Transcoders

Tang, Yiming, Sinha, Abhijeet, Liu, Dianbo

arXiv.org Artificial IntelligenceNov-19-2025

Although recent generative models are remarkably capable of producing instruction-following and realistic outputs, they remain prone to notable physical plausibility failures. Though critical in applications, these physical plausibility errors often escape detection by existing evaluation methods. Furthermore, no framework exists for automatically identifying and interpreting specific physical error patterns in natural language, preventing targeted model improvements. W e introduce Matryoshka Transcoders, a novel framework for the automatic discovery and interpretation of physical plausibility features in generative models. Our approach extends the Matryoshka representation learning paradigm to transcoder architectures, enabling hierarchical sparse feature learning at multiple granularity levels. By training on intermediate representations from a physical plausibility classifier and leveraging large multimodal models for interpretation, our method identifies diverse physics-related failure modes without manual feature engineering, achieving superior feature relevance and feature accuracy compared to existing approaches. W e utilize the discovered visual patterns to establish a benchmark for evaluating physical plausibility in generative models. Our analysis of eight state-of-the-art generative models provides valuable insights into how these models fail to follow physical constraints, paving the way for further model improvements.

large language model, machine learning, matryoshka transcoder, (15 more...)

arXiv.org Artificial Intelligence

2511.10094

Country: Asia (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.99)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Adaptive Regularization for Large-Scale Sparse Feature Embedding Models

Li, Mang, Lyu, Wei

arXiv.org Machine LearningNov-11-2025

The one-epoch overfitting problem has drawn widespread attention, especially in CTR and CVR estimation models in search, advertising, and recommendation domains. These models which rely heavily on large-scale sparse categorical features, often suffer a significant decline in performance when trained for multiple epochs. Although recent studies have proposed heuristic solutions, the fundamental cause of this phenomenon remains unclear. In this work, we present a theoretical explanation grounded in Rademacher complexity, supported by empirical experiments, to explain why overfitting occurs in models with large-scale sparse categorical features. Based on this analysis, we propose a regularization method that constrains the norm budget of embedding layers adaptively. Our approach not only prevents the severe performance degradation observed during multi-epoch training, but also improves model performance within a single epoch. This method has already been deployed in online production systems. Click-through rate (CTR) and conversion rate (CVR) estimation are critical for advertising, search and recommendation (ASR) applications. E-commerce platforms like Amazon and Taobao rely on optimizing CTR and CVR estimation to boost gross merchandise volume (GMV), while advertising platforms at Google and Meta depend on it to drive revenue growth.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Machine Learning

2511.06374

Country:

Asia > China > Shandong Province > Dongying (0.04)
North America > Canada > British Columbia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Services (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback

Visual Sparse Steering: Improving Zero-shot Image Classification with Sparsity Guided Steering Vectors

Chatzoudis, Gerasimos, Li, Zhuowei, Moran, Gemma E., Wang, Hao, Metaxas, Dimitris N.

arXiv.org Artificial IntelligenceJun-3-2025

Steering vision foundation models at inference time without retraining or access to large labeled datasets is a desirable yet challenging objective, particularly in dynamic or resource-constrained settings. In this paper, we introduce Visual Sparse Steering (VS2), a lightweight, test-time method that guides vision models using steering vectors derived from sparse features learned by top-$k$ Sparse Autoencoders without requiring contrastive data. Specifically, VS2 surpasses zero-shot CLIP by 4.12% on CIFAR-100, 1.08% on CUB-200, and 1.84% on Tiny-ImageNet. We further propose VS2++, a retrieval-augmented variant that selectively amplifies relevant sparse features using pseudo-labeled neighbors at inference time. With oracle positive/negative sets, VS2++ achieves absolute top-1 gains over CLIP zero-shot of up to 21.44% on CIFAR-100, 7.08% on CUB-200, and 20.47% on Tiny-ImageNet. Interestingly, VS2 and VS2++ raise per-class accuracy by up to 25% and 38%, respectively, showing that sparse steering benefits specific classes by disambiguating visually or taxonomically proximate categories rather than providing a uniform boost. Finally, to better align the sparse features learned through the SAE reconstruction task with those relevant for downstream performance, we propose Prototype-Aligned Sparse Steering (PASS). By incorporating a prototype-alignment loss during SAE training, using labels only during training while remaining fully test-time unsupervised, PASS consistently, though modestly, outperforms VS2, achieving a 6.12% gain over VS2 only on CIFAR-100 with ViT-B/32.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.01247

Country: Europe (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Multi-Tenant SmartNICs for In-Network Preprocessing of Recommender Systems

Zhu, Yu, Jiang, Wenqi, Alonso, Gustavo

arXiv.org Artificial IntelligenceJan-24-2025

Keeping ML-based recommender models up-to-date as data drifts and evolves is essential to maintain accuracy. As a result, online data preprocessing plays an increasingly important role in serving recommender systems. Existing solutions employ multiple CPU workers to saturate the input bandwidth of a single training node. Such an approach results in high deployment costs and energy consumption. For instance, a recent report from industrial deployments shows that data storage and ingestion pipelines can account for over 60\% of the power consumption in a recommender system. In this paper, we tackle the issue from a hardware perspective by introducing Piper, a flexible and network-attached accelerator that executes data loading and preprocessing pipelines in a streaming fashion. As part of the design, we define MiniPipe, the smallest pipeline unit enabling multi-pipeline implementation by executing various data preprocessing tasks across the single board, giving Piper the ability to be reconfigured at runtime. Our results, using publicly released commercial pipelines, show that Piper, prototyped on a power-efficient FPGA, achieves a 39$\sim$105$\times$ speedup over a server-grade, 128-core CPU and 3$\sim$17$\times$ speedup over GPUs like RTX 3090 and A100 in multiple pipelines. The experimental analysis demonstrates that Piper provides advantages in both latency and energy efficiency for preprocessing tasks in recommender systems, providing an alternative design point for systems that today are in very high demand.

artificial intelligence, machine learning, pipeline, (19 more...)

arXiv.org Artificial Intelligence

2501.12032

Genre: Research Report > New Finding (0.87)

Industry:

Information Technology > Services (1.00)
Energy > Oil & Gas > Midstream (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Interpretable Company Similarity with Sparse Autoencoders

Molinari, Marco, Shao, Victor, Tregubiak, Vladimir, Pandey, Abhimanyu, Mikolajczak, Mateusz, Pereira, Sebastian Kuznetsov Ryder Torres

arXiv.org Artificial IntelligenceDec-10-2024

Determining company similarity is a vital task in finance, underpinning hedging, risk management, portfolio diversification, and more. Practitioners often rely on sector and industry classifications to gauge similarity, such as SIC-codes and GICS-codes - the former being used by the U.S. Securities and Exchange Commission (SEC), and the latter widely used by the investment community. Since these classifications can lack granularity and often need to be updated, using clusters of embeddings of company descriptions has been proposed as a potential alternative, but the lack of interpretability in token embeddings poses a significant barrier to adoption in high-stakes contexts. Sparse Autoencoders (SAEs) have shown promise in enhancing the interpretability of Large Language Models (LLMs) by decomposing LLM activations into interpretable features. We apply SAEs to company descriptions, obtaining meaningful clusters of equities in the process. We benchmark SAE features against SIC-codes, Major Group codes, and Embeddings. Our results demonstrate that SAE features not only replicate but often surpass sector classifications and embeddings in capturing fundamental company characteristics. This is evidenced by their superior performance in correlating monthly returns - a proxy for similarity - and generating higher Sharpe ratio co-integration strategies, which underscores deeper fundamental similarities among companies.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.02605

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Virginia (0.04)
North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Banking & Finance > Trading (1.00)
Government > Regional Government > North America Government > United States Government (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Reviews: Learning Graph Representations with Embedding Propagation

Neural Information Processing SystemsOct-8-2024, 11:12:53 GMT

The authors introduce embedding propagation (EP), a new message-passing method for learning representations of attributed vertices in graphs. EP computes vector representations of nodes from the'labels' (sparse features) associated with nodes and their neighborhood. The learning of these representations is facilitated by two different types of messages sent along edges: a'forward' message that sends the current representation of the node, and a'backward' message that passes back the gradients of some differentiable reconstruction loss. The authors report results that are competitive with or outperform baseline representation learning methods such as deepwalk and node2vec. Quality: The quality of the paper is high.

embedding propagation, learning graph representation, representation, (9 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.57)

Add feedback

Efficient Tabular Data Preprocessing of ML Pipelines

Zhu, Yu, Jiang, Wenqi, Alonso, Gustavo

arXiv.org Artificial IntelligenceSep-23-2024

Data preprocessing pipelines, which includes data decoding, cleaning, and transforming, are a crucial component of Machine Learning (ML) training. Thy are computationally intensive and often become a major bottleneck, due to the increasing performance gap between the CPUs used for preprocessing and the GPUs used for model training. Recent studies show that a significant number of CPUs across several machines are required to achieve sufficient throughput to saturate the GPUs, leading to increased resource and energy consumption. When the pipeline involves vocabulary generation, the preprocessing performance scales poorly due to significant row-wise synchronization overhead between different CPU cores and servers. To address this limitation, in this paper we present the design of Piper, a hardware accelerator for tabular data preprocessing, prototype it on FPGAs, and demonstrate its potential for training pipelines of commercial recommender systems. Piper achieves 4.7 $\sim$ 71.3$\times$ speedup in latency over a 128-core CPU server and outperforms a data-center GPU by 4.8$\sim$ 20.3$\times$ when using binary input. The impressive performance showcases Piper's potential to increase the efficiency of data preprocessing pipelines and significantly reduce their resource consumption.

dataset, pipeline, piper, (15 more...)

arXiv.org Artificial Intelligence

2409.14912

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Virginia (0.04)
North America > United States > New York (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Convergence Analysis for Deep Sparse Coding via Convolutional Neural Networks

Li, Jianfei, Feng, Han, Zhou, Ding-Xuan

arXiv.org Artificial IntelligenceAug-10-2024

In this work, we explore the intersection of sparse coding theory and deep learning to enhance our understanding of feature extraction capabilities in advanced neural network architectures. We begin by introducing a novel class of Deep Sparse Coding (DSC) models and establish a thorough theoretical analysis of their uniqueness and stability properties. By applying iterative algorithms to these DSC models, we derive convergence rates for convolutional neural networks (CNNs) in their ability to extract sparse features. This provides a strong theoretical foundation for the use of CNNs in sparse feature learning tasks. We additionally extend this convergence analysis to more general neural network architectures, including those with diverse activation functions, as well as self-attention and transformer-based models. This broadens the applicability of our findings to a wide range of deep learning methods for deep sparse feature extraction. Inspired by the strong connection between sparse coding and CNNs, we also explore training strategies to encourage neural networks to learn more sparse features. Through numerical experiments, we demonstrate the effectiveness of these approaches, providing valuable insights for the design of efficient and interpretable deep learning models.

neural network, pkq, sparse, (14 more...)

arXiv.org Artificial Intelligence

2408.0554

Country:

Asia > China > Hong Kong (0.04)
North America > United States (0.04)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models

Lee, Yunjae, Kim, Hyeseong, Rhu, Minsoo

arXiv.org Artificial IntelligenceJun-11-2024

Training recommendation systems (RecSys) faces several challenges as it requires the "data preprocessing" stage to preprocess an ample amount of raw data and feed them to the GPU for training in a seamless manner. To sustain high training throughput, state-of-the-art solutions reserve a large fleet of CPU servers for preprocessing which incurs substantial deployment cost and power consumption. Our characterization reveals that prior CPU-centric preprocessing is bottlenecked on feature generation and feature normalization operations as it fails to reap out the abundant inter-/intra-feature parallelism in RecSys preprocessing. PreSto is a storage-centric preprocessing system leveraging In-Storage Processing (ISP), which offloads the bottlenecked preprocessing operations to our ISP units. We show that PreSto outperforms the baseline CPU-centric system with a $9.6\times$ speedup in end-to-end preprocessing time, $4.3\times$ enhancement in cost-efficiency, and $11.3\times$ improvement in energyefficiency on average for production-scale RecSys preprocessing.

opération, presto, proceedings, (15 more...)

arXiv.org Artificial Intelligence

2406.14571

Genre: Research Report (0.84)

Industry: Information Technology > Services (0.93)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Architecture (1.00)
(2 more...)

Add feedback

Bullion: A Column Store for Machine Learning

Liao, Gang, Liu, Ye, Chen, Jianjun, Abadi, Daniel J.

arXiv.org Artificial IntelligenceApr-13-2024

The past two decades have witnessed columnar storage revolutionizing data warehousing and analytics. However, the rapid growth of machine learning poses new challenges to this domain. This paper presents Bullion, a columnar storage system tailored for machine learning workloads. Bullion addresses the complexities of data compliance, optimizes the encoding of long sequence sparse features, efficiently manages wide-table projections, and introduces feature quantization in storage. By aligning with the evolving requirements of ML applications, Bullion extends columnar storage to various scenarios, from advertising and recommendation systems to the expanding realm of Generative AI. Preliminary experimental results and theoretical analysis demonstrate Bullion's superior performance in handling the unique demands of machine learning workloads compared to existing columnar storage solutions. Bullion significantly reduces I/O costs for deletion compliance, achieves substantial storage savings with its optimized encoding scheme for sparse features, and drastically improves metadata parsing speed for wide-table projections. These advancements position Bullion as a critical component in the future of machine learning infrastructure, enabling organizations to efficiently manage and process the massive volumes of data required for training and inference in modern AI applications.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2404.08901

Country:

North America > United States > Virginia (0.04)
North America > United States > California (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Law > Statutes (0.68)
Law > Civil Rights & Constitutional Law (0.68)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback